
    TARUC: A Topology-Aware Resource Usability and Contention Benchmark

    Computer architects have increased hardware parallelism and power efficiency by integrating massively parallel hardware accelerators (coprocessors) into compute systems. Many modern HPC clusters now consist of multi-CPU nodes along with additional hardware accelerators in the form of graphics processing units (GPUs). Each CPU and GPU is integrated with system memory via communication links (QPI and PCIe) and multi-channel memory controllers. The increasing density of these heterogeneous computing systems has resulted in complex performance phenomena, including nonuniform memory access (NUMA) and resource contention, that make application performance hard to predict and tune. This paper presents the Topology-Aware Resource Usability and Contention (TARUC) benchmark. TARUC is a modular, open-source, and highly configurable benchmark useful for profiling dense heterogeneous systems, providing insight for developers who wish to tune application codes for specific systems. An analysis of TARUC performance profiles from a multi-CPU, multi-GPU system is also presented.
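
    The core measurement idea is easy to sketch. The C fragment below is an illustration, not TARUC itself; the buffer size and the plain copy kernel are arbitrary choices. It measures aggregate copy bandwidth as the number of threads sharing the memory system grows:

        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        #define N (1 << 25)   /* 32 Mi doubles per buffer (~256 MB): defeats caches */

        int main(void) {
            double *src = malloc(N * sizeof *src);
            double *dst = malloc(N * sizeof *dst);
            if (!src || !dst) return 1;
            /* Touch both buffers up front so page faults stay out of the timings. */
            for (long i = 0; i < N; i++) { src[i] = (double)i; dst[i] = 0.0; }

            int max = omp_get_max_threads();
            for (int t = 1; t <= max; t *= 2) {
                omp_set_num_threads(t);
                double start = omp_get_wtime();
                #pragma omp parallel for
                for (long i = 0; i < N; i++)
                    dst[i] = src[i];              /* memory-bound copy */
                double secs = omp_get_wtime() - start;
                /* 16 bytes move per element: an 8-byte read plus an 8-byte write. */
                printf("%2d threads: %6.2f GB/s\n", t, 16.0 * N / secs / 1e9);
            }
            free(src); free(dst);
            return 0;
        }

    If aggregate bandwidth stops scaling as threads are added, the threads are contending for shared memory channels. TARUC generalizes this kind of probe with topology-aware placement of work across CPUs, GPUs, and the links between them.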

    GPUMap: A transparently GPU-accelerated Python map function


    Enhancing regional ocean modeling simulation performance with the Xeon Phi architecture

    Ocean studies are crucial to many scientific disciplines. Because the deep layers of the ocean are difficult to probe and observational data are scarce in some regions, the scientific community relies heavily on ocean simulation models. Ocean modeling is complex and computationally intensive, and improving the performance of these models will greatly advance the work of ocean scientists. This paper presents a detailed exploration of the acceleration of the Regional Ocean Modeling System (ROMS) software on the latest Intel Xeon Phi x200 architecture. Both shared-memory and distributed-memory parallel computing models are evaluated. Results show run-time improvements of nearly a factor of 16 compared to a serial implementation. Further experiments and optimizations, including the use of a GPU acceleration model, are discussed and results are presented.
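
    The shared-memory model evaluated here maps naturally onto OpenMP loops over the model grid. The following is a minimal sketch of that pattern, assuming a generic 2D relaxation kernel; the field names and stencil are illustrative, not taken from the ROMS source:

        #include <stdio.h>
        #include <omp.h>

        #define NX 1024
        #define NY 1024

        static double u[NX][NY], unew[NX][NY];

        /* One relaxation-style sweep over a 2D field: the loop shape that
           shared-memory ocean kernels typically reduce to. */
        static void sweep(void) {
            #pragma omp parallel for collapse(2) schedule(static)
            for (int i = 1; i < NX - 1; i++)
                for (int j = 1; j < NY - 1; j++)
                    unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                         u[i][j-1] + u[i][j+1]);
        }

        int main(void) {
            u[NX/2][NY/2] = 1.0;          /* a point disturbance, just to have data */
            double t0 = omp_get_wtime();
            for (int step = 0; step < 100; step++) sweep();
            printf("100 sweeps in %.3f s on %d threads\n",
                   omp_get_wtime() - t0, omp_get_max_threads());
            return 0;
        }

    Loops of this shape are why a many-core architecture like the Xeon Phi can deliver near-linear speedups: each grid point in a sweep is independent of the others.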

    Cross Teaching Parallelism and Ray Tracing: A Project-based Approach to Teaching Applied Parallel Computing

    Massively parallel Graphics Processing Unit (GPU) hardware has become increasingly powerful, available, and affordable. Software tools have also advanced to the point that programmers can write general-purpose parallel programs that take advantage of the large number of compute cores available in the hardware. With hundreds of compute cores available on a single device, program performance can increase by orders of magnitude. We believe that introducing students to the concepts of parallel programming for massively parallel hardware is of increasing importance in an undergraduate computer science curriculum. Furthermore, we believe that students learn best when given projects that reflect real problems in computer science. This paper describes the experience of integrating two undergraduate computer science courses to enhance student learning of parallel computing concepts. In this cross-teaching experience, we structured the integration of the courses such that students studying parallel computing worked with students studying advanced rendering for approximately 30% of the quarter-long courses. Working in teams on a joint project, both groups of students were able to see the application of parallelization to an existing software project, with both the benefits and complications exposed early in the curriculum of both courses. Motivating projects and performance gains are discussed, as well as student survey data on the effectiveness of the learning outcomes. Both performance and survey data indicate a positive gain from the cross-teaching experience.
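
    The technical hook that makes ray tracing a good parallel-computing project is compact enough to show. In this minimal sketch, trace() stands in for the students' renderer; the function and the image layout are assumptions, not code from the courses:

        #include <stdio.h>
        #include <omp.h>

        #define W 640
        #define H 480

        /* Stand-in for a per-pixel shading function. */
        static float trace(int x, int y) { return (float)(x ^ y) / (W + H); }

        int main(void) {
            static float image[W * H];
            /* Each pixel is independent, so one pragma parallelizes the render. */
            #pragma omp parallel for schedule(dynamic)
            for (int y = 0; y < H; y++)
                for (int x = 0; x < W; x++)
                    image[y * W + x] = trace(x, y);
            printf("corner sample: %f\n", image[0]);
            return 0;
        }

    Because pixels share no state, a dynamic schedule simply load-balances scenes in which some rays are more expensive than others.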

    A heterogeneous compute solution for optimized genomic selection analysis

    This paper presents a heterogeneous computing solution for an optimized genomic selection analysis tool, GenSel. GenSel can be used to efficiently infer the effects of genetic markers on a desired trait or to determine the genomic estimated breeding values (GEBV) of genotyped individuals. To predict which genetic markers are informational, GenSel performs Bayesian inference using Gibbs sampling, a Markov chain Monte Carlo (MCMC) algorithm. Parallelizing this algorithm proves to be a technically challenging problem because there exists a loop-carried dependence between iterations of the Markov chain. The approach presented in this paper exploits both the task-level parallelism (TLP) and the data-level parallelism (DLP) that exist within each iteration of the Markov chain. More specifically, a combination of CPU threads using OpenMP and GPU threads using NVIDIA's CUDA paradigm is implemented to speed up the sampling of each genetic marker used in creating the model. This speedup will allow the algorithm to accommodate the expected increase in observations on animals and in genetic markers per observation. The current implementation executes 1.84 times faster than the optimized CPU implementation.
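
    The parallelization strategy described above can be sketched in a few lines of C with OpenMP. The outer chain loop must stay serial because of the loop-carried dependence, but the reductions inside each marker update are data-parallel. Names such as X, resid, and sample_effect are illustrative stand-ins, not GenSel's actual code:

        #include <stddef.h>
        #include <omp.h>

        /* Placeholder for the Gibbs draw; a real sampler draws the marker
           effect from its full conditional distribution. */
        static double sample_effect(double xty, double xtx) { return xty / (xtx + 1.0); }

        void gibbs_sketch(int n_iter, int n_markers, int n_obs,
                          const double *X,   /* n_obs x n_markers, row-major */
                          double *resid, double *beta) {
            for (int it = 0; it < n_iter; it++) {      /* serial: each iteration
                                                          depends on the previous */
                for (int m = 0; m < n_markers; m++) {
                    double xty = 0.0, xtx = 0.0;
                    #pragma omp parallel for reduction(+:xty,xtx)
                    for (int i = 0; i < n_obs; i++) {  /* data-parallel reduction */
                        double x = X[(size_t)i * n_markers + m];
                        xty += x * resid[i];
                        xtx += x * x;
                    }
                    double old = beta[m];
                    beta[m] = sample_effect(xty, xtx);
                    #pragma omp parallel for           /* keep residuals consistent */
                    for (int i = 0; i < n_obs; i++)
                        resid[i] -= X[(size_t)i * n_markers + m] * (beta[m] - old);
                }
            }
        }

    The paper's implementation goes further, also dispatching the inner work to GPU threads via CUDA, but the division of labor is the same: serial chain, parallel within-iteration arithmetic.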

    High Performance Regional Ocean Modeling with GPU Acceleration

    The Regional Ocean Modeling System (ROMS) is an open-source, free-surface, primitive equation ocean model used by the scientific community for a diverse range of applications [1]. ROMS employs sophisticated numerical techniques, including a split-explicit time-stepping scheme that treats the fast barotropic (2D) and slow baroclinic (3D) modes separately for improved efficiency [2]. ROMS also contains a suite of data assimilation tools that allow the user to improve the accuracy of a simulation by incorporating observational data. These tools are based on four-dimensional variational methods [3], which generate reliable results but require considerably more computational resources than runs without data assimilation. The implementation of ROMS supports two parallel computing models: a distributed-memory model that uses the Message Passing Interface (MPI), and a shared-memory model that uses OpenMP. Prior research has shown that portions of ROMS can also be executed on a general-purpose graphics processing unit (GPGPU) to take advantage of the massively parallel architecture available on those systems [4]. This paper presents a comparison between two forms of parallelism. NVIDIA Kepler K20X GPUs were used to measure the performance of GPU parallelism using CUDA, while an Intel Xeon E5-2650 was used for shared-memory parallelism using OpenMP. The implementations are benchmarked using idealized marine conditions. Our experiments show that OpenMP was the fastest, followed closely by CUDA, while the serial version was considerably slower.
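
    The split-explicit scheme mentioned above has a simple control structure: many cheap 2D substeps nested inside each expensive 3D step. The sketch below shows that structure only; the function names, bodies, and step ratio are illustrative, not taken from the ROMS source:

        #include <stdio.h>

        /* Stubs standing in for the real solvers. */
        static void compute_baroclinic_rhs(void) {}
        static void step_barotropic(void)       {}
        static void couple_and_advance_3d(void) {}

        int main(void) {
            int n_steps_3d = 10;              /* slow 3D (baroclinic) steps */
            int ndtfast    = 30;              /* fast 2D substeps per 3D step;
                                                 the ratio is a model setting */
            for (int n = 0; n < n_steps_3d; n++) {
                compute_baroclinic_rhs();     /* expensive 3D work, long dt */
                for (int m = 0; m < ndtfast; m++)
                    step_barotropic();        /* cheap 2D work, short dt */
                couple_and_advance_3d();
            }
            printf("ran %d 3D steps x %d 2D substeps\n", n_steps_3d, ndtfast);
            return 0;
        }

    This structure also suggests one plausible reading of the result that OpenMP edged out CUDA: the many short barotropic substeps translate into frequent kernel launches and synchronization on a GPU, overhead that a shared-memory CPU implementation does not pay.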

    Twill: A hybrid microcontroller-FPGA framework for parallelizing single-threaded C programs

    System-on-a-chip platforms that incorporate both microprocessors and reprogrammable logic are increasingly used across fields ranging from the automotive industry to network infrastructure. Unfortunately, the development tools accompanying these products leave much to be desired, requiring knowledge of both traditional embedded-systems languages like C and hardware description languages like Verilog. We propose to bridge this gap with Twill, a truly automatic hybrid compiler that can take advantage of the parallelism inherent in these platforms. Twill can extract long-running threads from single-threaded C code and distribute these threads across the hardware and software domains to more fully utilize the asymmetric characteristics of processors and the embedded reconfigurable logic fabric. We show that Twill provides a significant performance increase on the CHStone benchmarks: an average 1.63 times speedup over the pure hardware approach and a 22.2 times speedup on average over the pure software approach, while in general decreasing the area required by the reconfigurable logic compared to the pure hardware approach.
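
    The kind of input such a compiler operates on can be illustrated with a small example. In the hypothetical C code below (not drawn from Twill or CHStone), the two loop nests write disjoint arrays, so an automatic partitioner could extract them as concurrent threads and map one to the reconfigurable fabric and the other to the CPU:

        /* Single-threaded input with two independent long-running stages. */
        void stages(const int *in, int *filtered, int *histogram, int n) {
            for (int i = 1; i < n - 1; i++)   /* fabric candidate: regular,
                                                 arithmetic-heavy access */
                filtered[i] = (in[i-1] + 2 * in[i] + in[i+1]) / 4;
            for (int i = 0; i < n; i++)       /* CPU candidate: data-dependent
                                                 memory access */
                histogram[in[i] & 0xFF]++;
        }

    Discovering that the two nests carry no dependence on each other, and choosing which side of the hardware/software boundary each belongs on, is precisely the analysis a tool in this space must automate.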

    The Presentation of Temperature Information in Television Broadcasts: What is Normal?

    In a typical weather broadcast, observed daily temperature information such as maximum and minimum temperatures is shown and compared to the daily average or “normal”. Such information, however, does not accurately convey whether a particular day is fairly typical for that time of year or truly an unusual occurrence. It is therefore suggested that the presentation of temperature information be augmented with elementary statistical information, giving a more meaningful presentation without the need to explain the basis of the statistics. A study of the climatological maximum and minimum temperatures over a 30-year period for Columbia, Missouri is performed to provide the rationale for displaying a “typical” temperature range. This information was incorporated into television weather broadcasts at KOMU TV-8, the campus television station and local NBC affiliate.
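
    One elementary version of such a “typical range” can be computed in a few lines. The sketch below uses invented temperature values and reports the mean plus or minus one standard deviation, which covers roughly the middle two-thirds of years; the paper's exact statistic may differ:

        #include <math.h>
        #include <stdio.h>

        int main(void) {
            /* Hypothetical 30 years of July 15 maxima (deg F), for illustration. */
            double tmax[30] = { 88, 91, 95, 84, 90, 97, 86, 92, 89, 94,
                                83, 99, 87, 90, 93, 85, 96, 88, 91, 92,
                                82, 90, 94, 89, 87, 98, 93, 86, 91, 90 };
            double sum = 0, sumsq = 0;
            for (int i = 0; i < 30; i++) { sum += tmax[i]; sumsq += tmax[i] * tmax[i]; }
            double mean = sum / 30;
            double sd = sqrt((sumsq - 30 * mean * mean) / 29);   /* sample std dev */
            printf("typical July 15 high: %.0f to %.0f F (mean %.1f)\n",
                   mean - sd, mean + sd, mean);
            return 0;
        }

    A range like this tells viewers at a glance whether today's high is ordinary or genuinely unusual, without any on-air discussion of the underlying statistics.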

    Genetic Testing to Inform Epilepsy Treatment Management From an International Study of Clinical Practice

    IMPORTANCE: It is currently unknown how often and in which ways a genetic diagnosis given to a patient with epilepsy is associated with clinical management and outcomes. OBJECTIVE: To evaluate how genetic diagnoses in patients with epilepsy are associated with clinical management and outcomes. DESIGN, SETTING, AND PARTICIPANTS: This was a retrospective cross-sectional study of patients referred for multigene panel testing between March 18, 2016, and August 3, 2020, with outcomes reported between May and November 2020. The study setting included a commercial genetic testing laboratory and multicenter clinical practices. Patients with epilepsy, regardless of sociodemographic features, who received a pathogenic/likely pathogenic (P/LP) variant were included in the study. Case report forms were completed by all health care professionals. EXPOSURES: Genetic test results. MAIN OUTCOMES AND MEASURES: Clinical management changes after a genetic diagnosis (ie, 1 P/LP variant in autosomal dominant and X-linked diseases; 2 P/LP variants in autosomal recessive diseases) and subsequent patient outcomes as reported by health care professionals on case report forms. RESULTS: Among 418 patients, median (IQR) age at the time of testing was 4 (1-10) years, with an age range of 0 to 52 years, and 53.8% (n = 225) were female individuals. The mean (SD) time from a genetic test order to case report form completion was 595 (368) days (range, 27-1673 days). A genetic diagnosis was associated with changes in clinical management for 208 patients (49.8%) and usually (81.7% of the time) within 3 months of receiving the result. The most common clinical management changes were the addition of a new medication (78 [21.7%]), the initiation of medication (51 [14.2%]), the referral of a patient to a specialist (48 [13.4%]), vigilance for subclinical or extraneurological disease features (46 [12.8%]), and the cessation of a medication (42 [11.7%]). Among 167 patients with follow-up clinical information available (mean [SD] time, 584 [365] days), 125 (74.9%) reported positive outcomes, 108 (64.7%) reported reduction or elimination of seizures, 37 (22.2%) had decreases in the severity of other clinical signs, and 11 (6.6%) had reduced medication adverse effects. A few patients reported worsening of outcomes, including a decline in their condition (20 [12.0%]), increased seizure frequency (6 [3.6%]), and adverse medication effects (3 [1.8%]). No clinical management changes were reported for 178 patients (42.6%). CONCLUSIONS AND RELEVANCE: Results of this cross-sectional study suggest that genetic testing of individuals with epilepsy may be materially associated with clinical decision-making and improved patient outcomes.